# Variational autoencoder

| Model | Author | License | Downloads | Likes | Tags | Description |
|---|---|---|---|---|---|---|
| Pulaski ProbUNet3D Base VSeg | soumickmj | Apache-2.0 | 14 | 0 | Image Segmentation | PULASki is a computationally efficient biomedical image segmentation tool that captures the variability in expert annotations, particularly suited to small datasets and class-imbalanced tasks. |
| Nepali Male V1 | tuskbyte | Apache-2.0 | 78 | 0 | Speech Synthesis, Transformers, Other | Nepali male voice synthesis model based on the VITS architecture, supporting high-quality text-to-speech. |
| Vits Cmn | BricksDisplay | Apache-2.0 | 21 | 4 | Speech Synthesis, Transformers, Chinese | VITS is an end-to-end text-to-speech model based on adversarial learning and a conditional variational autoencoder; this checkpoint supports Chinese speech synthesis. |
| Mms Tts Mah | facebook | | 124 | 0 | Speech Synthesis, Transformers | Marshallese text-to-speech model developed by Meta, using the end-to-end VITS architecture for high-quality speech synthesis. |
| Mms Tts Llg | facebook | | 4 | 0 | Speech Synthesis, Transformers | Loluo (llg) text-to-speech model developed by Meta, part of the Massively Multilingual Speech (MMS) project. |
| Mms Tts Ljp | facebook | | 4 | 0 | Speech Synthesis, Transformers | Lampung Api text-to-speech model developed by Meta, part of the MMS project. |
| Mms Tts Bgr | facebook | | 14 | 0 | Speech Synthesis, Transformers | Bawm Chin text-to-speech model developed by Meta, part of the MMS project. |
| Mms Tts Khm | facebook | | 217 | 7 | Speech Synthesis, Transformers | Khmer text-to-speech model from Facebook's MMS project, implementing end-to-end speech synthesis with the VITS architecture. |
| Mms Tts Pan | facebook | | 800 | 2 | Speech Synthesis, Transformers | Eastern Punjabi text-to-speech model developed by Facebook, based on the VITS architecture, supporting high-quality speech synthesis. |
| Mms Tts Pag | facebook | | 18 | 0 | Speech Synthesis, Transformers | Pangasinan text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis. |
| Mms Tts Gbm | facebook | | 18 | 0 | Speech Synthesis, Transformers | Garhwali text-to-speech model developed by Meta, supporting high-quality speech synthesis. |
| Mms Tts Ilo | facebook | | 40 | 0 | Speech Synthesis, Transformers | Ilocano text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis. |
| Mms Tts Swh | facebook | | 161 | 9 | Speech Synthesis, Transformers | Swahili text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis. |
| Mms Tts Mal | facebook | | 307 | 2 | Speech Synthesis, Transformers | Malayalam text-to-speech model in Facebook's MMS project, implementing end-to-end speech synthesis based on the VITS architecture. |
| Mms Tts Hat | facebook | | 223 | 1 | Speech Synthesis, Transformers | Haitian Creole text-to-speech model developed by Meta, part of the Massively Multilingual Speech (MMS) project. |
| Vits Vctk | kakao-enterprise | MIT | 3,601 | 13 | Speech Synthesis, Transformers | VITS is an end-to-end speech synthesis model that predicts a speech waveform from an input text sequence. It employs a conditional variational autoencoder (VAE) architecture, comprising a posterior encoder, a decoder, and a conditional prior module. |
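Several of the models above are built on a conditional variational autoencoder, whose training relies on the reparameterization trick: sampling the latent as a deterministic function of the encoder outputs plus independent noise, so gradients can flow through the sampling step. A minimal NumPy sketch of that mechanic and the standard Gaussian KL term follows (an illustration only, not the VITS implementation; the function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps z differentiable with respect to
    mu and log_var, which is what allows end-to-end VAE training.
    """
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Example: a 16-dimensional latent for a single input.
mu = np.zeros(16)
log_var = np.zeros(16)  # sigma = 1 everywhere
z = reparameterize(mu, log_var)
print(z.shape)                              # (16,)
print(kl_to_standard_normal(mu, log_var))   # 0.0 when q(z|x) = N(0, I)
```

In a full VAE, `mu` and `log_var` would come from the posterior encoder, and the KL term would be weighed against a reconstruction loss; the snippet shows only the sampling and regularization pieces.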